DOCUMENT RESUME

ED 186 469                                        TM 800 174

AUTHOR         Engelhardt, David F.
TITLE          Motivation and Test-wiseness. Director's Handbook:
               Topics in Testing, Measurement, and Evaluation,
               Volume V, Fall 1979.
INSTITUTION    New Jersey State Dept. of Education, Trenton. Div. of
               Operations, Research, and Evaluation.
NOTE           13p.
EDRS PRICE     MF01/PC01 Plus Postage.
DESCRIPTORS    *Educational Testing; Elementary Secondary Education;
               *Guessing (Tests); Guidelines; Student Attitudes;
               *Student Motivation; Teacher Attitudes; Testing
               Problems; *Test Wiseness
IDENTIFIERS    New Jersey



ABSTRACT

Because motivational factors and test-wiseness can contaminate testing used for needs assessment or evaluation purposes, techniques for increasing student and teacher motivation are discussed. Guidelines concerning guessing are also presented. While guessing is encouraged on program evaluation or screening tests, it is not advocated for diagnostic tests; thus, different scoring formulas may be appropriate for different applications of testing. Instruction designed to increase student motivation and test-wiseness is described, and is said to be potentially fruitful when several situations are considered: the student's will to win, feelings of individual powerlessness, withdrawal due to previous failure, purposeless testing, and teachers' negative attitudes. Suggestions to combat these problems include feedback to students and parents. Advice is provided concerning several issues in a test-wiseness program: when students should guess; strategies for norm-referenced, criterion-referenced, screening, and diagnostic tests; and scoring formulas. It is concluded that students must be taught to pace their testing time, that item construction cues can be taught, and that practice is useful. (GDC)
MOTIVATION AND TEST-WISENESS
David F. Engelhardt



Introduction 



This paper cites instances where two variables, motivation and test-wiseness, can contaminate needs assessment and evaluation studies when using the types of tests most often given by districts in the T&E process. Although tests claim to measure attainment of the skills labeled in the item specifications, the scores often reflect variations in motivation or test-wiseness (these variables contribute to both the invalidity and the unreliability of data).

The author suggests some methods to increase the motivation of students by generally increasing the chances of rewarding students and teachers. The learning of test-wiseness and the practice of test-wise behavior are advocated. A caution against guessing is given for diagnostic tests, while guessing is advocated on tests used for program evaluation or screening (needs assessment). If a test is used for both diagnostic and program evaluation purposes, two separate scoring procedures could be used.

Recommendations are given in the spirit of reducing misclassifications of able students as being in need of basic skill remediation, increasing the validity of measurement, and increasing the reliability of test scores. If students or non-random groups are being compared, as is done by scores on norm-referenced tests or with New Jersey Educational Assessment tests, the elimination of contaminating measures of test-wiseness is advocated as an important goal. To forget cultural or psychological differences in motivation and test-wiseness may lead to gross inefficiency in our remedial and preventive programs as well as to incorrect evaluative conclusions.

Why don't test scores respond more easily to instructional effort? Can our kids really lack so many basic skills? Questions such as these can be asked of test results achieved by some public schools as well as of certain program areas (e.g., grammar) in the curricula of some elite, private schools. The questions can apply to results produced through norm-referenced or standardized criterion-referenced testing. As part of a management team, test coordinators ruminate over their assessments and evaluations with the eyes of an examiner, statistician, and decision-maker. Having done this myself for several years, I began looking at the test from the examinees' standpoint and concluded, in part, that we:

(1) Have failed to motivate or reward many students so that they desire to perform to the best of their abilities, and

(2) Have often failed to improve the ability of students to demonstrate their skills on tests.

The validity of our testing operations is affected by the above characteristics of the examinees.

This paper deals with observations of good and poor motivation and the need for test-taking instruction, which leads to what some call test-wiseness. Suggestions and references for remedial action are given so that a local school district can launch its own student motivation program if desired. Several school districts and at least one state have begun to develop test-wiseness in their students, often under the rubric of teaching study skills, which have lifetime benefit. These school districts include Philadelphia; Washington; Chicago; Dade County, Florida; and Montgomery County, Maryland. I am aware of only two filmstrip/cassette programs that address test-taking strategies (Guidance Associates of Pleasantville, N.J., and Lamport et al., 1976). It is recommended that these programs be previewed before purchasing. Books or syllabi recommended for use with the Maryland program or others are: Honig (1973), Hook (1967), Huff (1961), Jongsma (1975), Millman and Pauk (1969), and Slakter et al. (1979). Additionally, Erickson (1972), Ford (1973), and Millman et al. (1965) outline the framework around which a fruitful test-wiseness program could be constructed locally.

The Will To Win 

Of course, such test-wiseness programs have little worth unless students wish to do well. Some score adjustments up to the chance level of success can be made to partially counter poor motivation. Such adjustments, discussed later, actually simulate test-wise behavior. I have seen test-wise seniors in an excellent private school fail to perform on a Missouri College English Test, even with the headmaster urging students to do their best. This behavior was exhibited even though students knew the headmaster was seeking to legitimately evaluate the school's new Mechanics of English Program. Problems of obtaining the best efforts of any school's students seem to become more frequent as grade level increases. With proper teacher attitude in the primary grades, students often eagerly await the test almost as a game. Evidence of this eagerness is generally lacking in secondary students, according to my experience. There is a chance that test-wiseness instructional units may prove to interest some rebellious students who are intrigued by the idea of beating a system which has heretofore "turned them off." Nonetheless, motivation should and can be addressed outside of the test-wiseness unit.

One might conclude that students invariably must see a reward in the test's output for themselves before mustering all their test-taking energies and wiseness. Let us consider a few situations affecting motivation:

Feelings of Powerlessness 

Especially prevalent in children of lower socio-economic class is the feeling that no matter what they do, they have no power over what happens to them. Being subject to what appears to be a capricious environment, such students, as opposed to many middle-class children, do not seem to develop the attitude that effort leads to success and eventually to better things. Eisenberg (1967) points out that middle-class children find reward within a test, feeling that progress in scores is the path to success. A professor of elementary science education once said to me, "Perhaps the best reason to teach elementary science is to show some children that there is order to our world, and that through their minds and actions they can control part of their environment."

A related and hopefully rarer phenomenon can be encountered when a student feels his/her destiny is predetermined in a favorable sense, often by social bias (not ability). When ability is not a determiner of destiny, the student relies on a "birthright" to reach his/her goal. Such an attitude affects learning, but test-taking situations may be even more sensitive. Since this attitude may not affect a long period of learning as strongly as a concentrated, sensitive request for a demonstration of skills, the demonstration of skills may be more seriously hampered by the non-competitive attitude. Therefore, the test of skills will not reveal the learning which has taken place despite the student's lack of challenge. Some solutions to this might be to:

(1) Make tests more interesting.

(2) Convince the student of a competitive world. Establish a self-competitiveness, an attitude often adopted by star athletes.

Withdrawal Due to Previous Failure

Eisenberg (1967) confirms what many educators have seen in older children: if children meet with repeated failure, it is much more rewarding for them not to compete at all. Many children, especially minority students, withdraw from a testing situation to save face; to fail without trying seems better than to fail despite one's best efforts. Unfortunately, the child does not discerningly choose where he/she might succeed if effort were mustered.
Eisenberg found that lower-class children were likely to give any answer that would end the testing, regardless of whether the answer was right or wrong.

The child perceives that failure to exert effort taints with vagueness any conclusions as to ability, thereby saving face by having lowered the confidence of statements about the examinee. Generalizing to our situations, we might say that the child has worked toward minimum loss under avoidance strategies. The program's evaluation therefore suffers through measurements of low confidence.

The solution seems obvious: in areas of basic skills and some special talent areas, give the child a taste of success. Appropriate-level (out-of-grade) testing may help, but most important is the prior classroom experience. It may be possible to encourage students to engage in test activities by giving them successful and recent experiences with material similar to the forthcoming test.

Purposeless testing and teacher attitude:

Who in testing has not at some time heard the complaints of teachers when examination time approaches? Students are quick to absorb the sense of purposeless test-taking when results are rarely used by the teacher or never shared with students. It is well recognized by many test coordinators that teachers' attitudes and overt concerns regarding test results are major incentive factors in student performance. Of course, the purposes and consequences of the testing should be explained before testing, not as an afterthought, which would have little effect on test-taking strategy.

Convincing teachers and students of the purpose served by a test is not a small task. Furthermore, the process may backfire in special circumstances where program evaluation or screening is a major purpose. It is possible that a vengeful student may capitalize on such knowledge to attack a teacher, principal, or the system. In another situation, revealing the purpose of Title I testing in a suburban New York area school system lowered test scores because children wanted to qualify for the special summer program, which involved field trips, games, and reading skill instruction. Since New Jersey has provided programs for under-achieving gifted or talented students, it wouldn't be surprising if some gifted students aspired to qualify for gifted and talented compensatory education programs by scoring low on achievement tests. Such attempts can be dampened by using other criteria to verify screening test scores.

As long as a system has healthy relations between staff and students, and attractive program alternatives, explaining the purposes of testing and making the test purposeful at student and teacher levels will probably increase scores. Some suggestions are:
(1) Make sure teachers receive all interpretative manuals, and order time-saving reports from computerized scoring services. Don't expect hand tallying.

(2) Involve staff in the selection of tests and provide opportunity for criticism of the tests and report format. With mandated tests, allow teachers to help construct items or, as a district, declare certain items non-relevant. Obviously this only pertains to a criterion-referenced approach.

(3) Teachers must be able to interpret test results. Unfortunately, central office staff rarely have time to do justice to in-service training or conferences regarding test results. Unless administrators shift priorities to allotting more in-service training time to test data analysis, we must rely on better teacher training, clearer report formats, and teachers' desire to self-educate. Building committees may help educate teachers with more flexibility than district training.

(4) Increase feedback to parents and students with computerized reports and teacher conferences. The purposefulness of any test wanes with delay in the return of results. Feedback should be as current as possible and in time for decisions predicated on test results. Such feedback should be expedited by methods discussed elsewhere in this Handbook. Pretend, as a test coordinator, that you are processing blood samples for diagnostic work in a hospital. Schedule testing so the mails can work for you over weekends and holidays. Pre-correction processing might take place on a 20-hour work schedule. Ship results by air. The added cost is minor compared to the effort and cost expended in testing.

Of course, in-house or cooperative correction can give the best turn-around time on correction. For certain testing programs, self-scoring sheets or student scoring can be utilized. Calculators and teaching machines can offer immediate feedback with a record for the teacher.

In Long Branch, we have had the experience of students calling our guidance department during the summer to find out the results of their Planning Career Goals Test Battery. The Battery contains guidance information matched with basic skills assessment. This group of students had been tested as juniors in the late spring, and in spite of a motivationally difficult time to test, the students exhibited interest because of the chance for personal benefit. However, we were not adequately staffed to respond to these summer inquiries. If anything is to be learned from this experience, it is that provision for summer counseling should be built in as a follow-up to spring testing, as a stimulus to increase test performance. Under current regulations, such follow-up is not considered fundable under compensatory education. Yet failure to provide such follow-up by funding a summer guidance counselor will probably increase the compensatory education load, since students see less purpose in the test. Relating test scores to career goals (even if these goals are temporary) appeals to many students in the upper grades; the appeal is much greater than can be aroused by threatening more homework or a poor program evaluation.

(5) Although the main purpose of some testing may be program evaluation, stress some personal use of test data for the student. Durost's (1974) indictment of some Title I test-taking strategies demonstrates that program evaluators must be sensitive to high guessing and to poorly motivated students. As suggested above, using guidance-related tests or establishing guidance-related norms may serve as an effective way to interest students in test results. Currently most commercial norming does not provide interesting norms for the student. A secretarial student may be interested in how well he/she does on a grammar section of a test in relation to other commercial students, not in relation to all students in large cities or in the Northeast. On the other hand, the test coordinator would want the more general norm for assessment. Planning Career Goals (CTB/McGraw-Hill) allows such dual comparisons. When given the opportunity to take extra sections of the test, 50% of the students in one district used their free time for more testing.

Techniques used to reduce testing time by dividing items among students may lower motivation if students see no individual consequences of taking the test. Such approaches as matrix sampling should be carefully monitored.

(6) Provide guides for student interpretation to relieve teachers or guidance counselors of some interpretation. The Planning Career Goals Test has exemplary student materials. Most other batteries have limited hand-out materials.

(7) Provide instructional program response to demonstrated need.

(8) Include certain results of standardized or departmental criterion-referenced tests as part of a student's grade or as extra credit. Embed certain evaluation or assessment items in normal classroom activities. Most test manual directions have stressed the reduction of anxiety on the part of students. Possibly the pendulum has swung too far: many students lack any hint of anxiety, and some even sleep during assessment tests! Certainly some standardized tests (e.g., MLA foreign language tests, Howell Geometry Test, Missouri College English Test) are valid enough for particular courses to warrant that credit be awarded toward a student's grade.

Variety in the testing program: 

Providing variety in the testing program that a student experiences may have motivational consequences, although I am not aware of any formal studies on this factor. Even if the standardization programs that generate norms for our tests suffer from possible fatigue and boredom (especially "alternate form" norming), it is probably not wise to try to duplicate such negative factors. Test coordinators can try to avoid examinee boredom by varying the testing approach and series. I have heard counselors remark more than once that after four or six years of the same test, students just don't try to perform on the test, even if the questions are different (but of the same style). It seems that the title page of the test is enough to dissuade some students from trying, even though the various levels in a series do have different questions. In New Jersey, the minimum skills test may provide a break from the yearly administration of a test series. Some life-skill tests may be utilized to good motivational end in providing variety in a district's test schedule. Measurement of growth on one scale might have to be delayed a year or two, but this may increase the validity of the measurement.

Wade Boykin's research at Cornell University shows cultural differences in reaction to variety in test stimuli. Perhaps unsuccessful students look forward to trying new test situations in which to prove their abilities, whereas successful students look at "variety" as a threatening challenge.

Teacher enthusiasm: 

If teachers are confident that they can reach a program improvement goal, their enthusiasm to demonstrate such may stimulate students to perform. Impossible program achievement goals may tend to dampen teacher enthusiasm, which leads to a poor orientation of students toward the test. The use of short-range goals with "front-line workers" is often a better management technique than revealing long-range goals or, even worse, evaluating the worker on long-range goal standards. Teacher confidence also can be increased by adequate pre-test orientation of teachers, even if only in the area of examiners' instructions. A confused teacher during administration of the test also leads to a poor orientation of students toward the test. A confused teacher can hardly be expected to show much enthusiasm for a test which contributes to such unwanted insecurity or embarrassment.

Assuming that teachers and students desire to perform, the next concern for the test coordinator is to allow pertinent skills to be evaluated or assessed with reliability and validity. The Concern Is Not To Measure Or Compare Non-Relevant Skills. If we wish to measure communication skills, let's not measure the ability of the students to pace themselves on the test or some other test-taking skill.

Certain criticisms levied against test-wiseness training often accuse the school of "teaching to the test." I am not suggesting rehearsal of test questions appearing verbatim on the test, nor am I suggesting cramming for the content of a test. What I am suggesting has been well established in the literature (Eakins et al., 1976; Erickson, 1972; Fenton and Mueller, 1977; Ford, 1973; Maryland Department of Education, 1975; Millman et al., 1965; and Sabers, 1975): the teaching and acquisition of test-wiseness, that is, the ability to reliably demonstrate the full extent of one's pertinent skills and knowledge through the medium of a valid test, including the demonstration of mastered and partially developed skills.

Without such training, many students who fail to achieve minimum competency scores may possess satisfactory skills in reading and mathematics. Such failures to prove competency contribute to the overload on remedial services. Special funding is allotted to teach reading when, for some students, it might be best to teach test-wiseness. Concern for cost-effectiveness should urge development and implementation of test-wiseness units and provision for good testing environments. The quest for valid instruments in the minimum competency movement may be severely confounded by the variability in test-wiseness and testing environment (including teacher attitudes).

Ford (1973) reports that "coaching" (teaching the content area of the test, and cramming) before tests has not been shown to raise scores as much as test-wiseness study which avoided instruction in the subject matter to be tested (Eakins et al., 1976). For instance, Barron's guide How to Prepare for College Entrance Examinations (Brownstein and Weiner, 1969) is more a coaching book as compared to Honig (1973), Hook (1962), Huff (1961), and Millman (1969). Private schools and some public schools might bear Ford's conclusions in mind when constructing preparatory courses and selecting appropriate materials for PSAT and SAT examinations.

With poorly defined domains, it is difficult to ascertain what an item intends to measure and what is germane to test-wiseness; to teach the former is coaching (if done specifically for the many domains just prior to testing), while to teach the latter is what concerns us now. Fenton and Mueller (1977) point out that to teach to the domain of the test is legitimate. If done well in advance of a battery or long evaluation test, such teaching is the essence of the instructional program. No one advocates teaching specific items to be found on the test. Sabers (1975) emphasizes that psychologists, through the American Psychological Association, deemed it essential that the examinee be given the strategy to maximize his/her test score. The following factors are considerations when devising a test-wiseness unit.

Should a student guess on examinations? Perhaps the most significant and controversial facet of test-wiseness pertains to guessing on multiple-choice tests. The author's opinion is that we should not shy from increasing the pace of students or urging the use of certain techniques so that all answers are completed, even if this may result in some blatant guessing. In fact, some scoring formulas correct for leaving questions unanswered by adding to the number of right answers the chance score that might have been obtained by answering all unanswered questions. This yields a corrected raw score which is equal to or higher than the number right. If it is not possible to alter present correction formulas (due to economics or the inflexibility of either current computer programs or standardized test correction procedures), physical alterations of answer sheets can approach the same end. Such credit for unanswered questions gives each examinee his/her maximum benefit from test-wiseness in guessing strategies, as if he/she were very test-wise. Of course, this method makes it very clear that assigning meaning to raw scores at or below chance level is an erroneous procedure except when measuring the tendency to guess. Such a statement is true even when "number right" scoring formulas are used.
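A minimal sketch of the credit-for-omits rule described above, written in Python for illustration. It assumes every item has the same number of answer options k and is worth one point, so the corrected score is S = R + O/k (R right, O omitted). The function name is hypothetical and is not taken from any scoring service.

```python
from fractions import Fraction


def omit_credit_score(num_right: int, num_omitted: int, num_options: int) -> Fraction:
    """Number right plus the chance score expected on the omitted items.

    Assumption (illustrative): each item has `num_options` choices and is worth
    one point, so a blind guess on an omitted item is right with probability
    1/num_options.
    """
    if num_options < 2:
        raise ValueError("need at least two answer options")
    if num_right < 0 or num_omitted < 0:
        raise ValueError("counts must be non-negative")
    return num_right + Fraction(num_omitted, num_options)


# A cautious examinee on a hypothetical 40-item, four-option test:
# 24 right and 12 omitted.
print(omit_credit_score(24, 12, 4))  # 27
```

Because O/k is never negative, the corrected score is always at least the number right, which matches the text. Exact fractions are used so that non-integer credits (for example, 13 omits on a four-option test) are not rounded away.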

If we are comparing students or non-random groups, as is done by scores on norm-referenced tests or with item performance results on the New Jersey Educational Assessment Program tests, it would behoove us to eliminate contaminant measures such as test-wise guessing. Giving credit for unanswered questions eliminates the penalty for students who were too cautious, withdrew from competition, were mistakenly cautioned not to guess, or exhibited other unwise test behaviors. Students not needing to guess or having greater partial knowledge will have increased chances of being correct as compared to "pure guessers" and will still obtain higher scores, even though the knowledgeable examinees will not have their scores increased as much by the above scoring correction.
To use number-right scoring formulas ignores the problem of the extraneous guessing variable. To provide the above correction places this variable at its maximum estimate, so that comparative interpretation can be improved. With criterion-referenced tests, this maximum estimate can be considered a floor of the performance levels obtainable with no relevant skill.

It is extremely difficult to prevent guessing (although providing points for unanswered questions retards guessing), but we can reach its maximum limit by encouraging completion of the test or through scoring formulas. In one sense, the limit varies with the ability of the examinee: the higher the ability, the less guessing will have occurred. However, on multiple-choice tests, a certain score could be achieved without reading the test, simply by blackening the first answer for each question. This score could be considered the maximum limit of the chance score. An individual's maximum guessing limit is the difference between an achieved score reflecting the student's actual ability and a score obtained by adding the remaining possible score points due to chance on any given assessment instrument.
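The chance floor and the individual maximum guessing limit can be made concrete under one stated assumption: the classical knowledge-or-random-guessing model, in which an examinee knows K of the N items outright and guesses blindly on the rest, with success probability 1/k. The functions below are hypothetical illustrations of that model, not the author's procedure. The floor also assumes that the keyed answers are spread evenly over the options, so that always marking the first option earns about N/k.

```python
from fractions import Fraction


def chance_floor(num_items: int, num_options: int) -> Fraction:
    """Expected number right from answering every item blindly (e.g. always
    marking the first option), assuming keyed answers are spread evenly."""
    return Fraction(num_items, num_options)


def max_guessing_limit(num_items: int, num_known: int, num_options: int) -> Fraction:
    """Expected chance points an examinee who truly knows `num_known` items
    could add by blindly answering all the rest (knowledge-or-random-guessing
    model; an illustrative assumption, not a claim about any real test)."""
    if not 0 <= num_known <= num_items:
        raise ValueError("num_known must lie between 0 and num_items")
    return Fraction(num_items - num_known, num_options)


# Hypothetical 40-item, four-option test: known items, chance limit, total.
for known in (0, 20, 36):
    limit = max_guessing_limit(40, known, 4)
    print(known, limit, known + limit)
# 0 10 10
# 20 5 25
# 36 1 37
```

When K = 0, the individual limit reduces to the chance floor N/k. It shrinks as ability rises, which matches the point above that the higher the ability, the less guessing will have occurred.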

If knowledge accounted for part of the score which might have been attributed to chance, that degree of knowledge is of little practical consequence. Furthermore, only non-standard scoring techniques, not available through most test correction services, would distinguish between some guessing and confident answers.

Ultimately, to advise students not to guess leaves the examiner uncertain whether some examinees actually guessed. Therefore, to reach the minimum extent of guessing is neither as reliable nor as realistic as to reach the maximum extent.

Are There Different Guessing Strategies For CRTs and NRTs? Crocker and Benson (1976) distinguish advice based on criterion-referenced tests (CRTs) and norm-referenced tests (NRTs), based in part on student reaction to the different test types. While it is true that the scoring of criterion-referenced tests can be more easily adapted to measuring confidence of answers or giving credit for indicating wrong answers (Coombs-type scoring; Jacobs, 1974), the uses of, and the anxiety reactions to, CRTs and NRTs do not seem so different to me as to deserve different test-taking strategies. Standards to be set for CRTs might consider equalizing points gained through chance guessing by ensuring completion of all items, and then acknowledging that part of the test score could be accomplished by random guessing (Millman, 1973). It should be noted that some tests employ a high number of answer options to reduce chance levels on scores. The Diagnostic Mathematics Inventory (1977), for example, uses seven options.
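To see why acknowledging chance matters when setting CRT standards, and why more answer options help, one can compute the probability that a pure guesser reaches a cut score. This sketch assumes that every item is answered blindly and independently, so the number right follows a binomial distribution. The 20-item test and the cut score of 8 are hypothetical values chosen only for illustration.

```python
from math import comb


def prob_reach_cut_by_guessing(num_items: int, cut_score: int, num_options: int) -> float:
    """P(X >= cut_score) when all `num_items` items are answered blindly,
    i.e. X ~ Binomial(num_items, 1/num_options). Illustrative assumption only."""
    p = 1.0 / num_options
    return sum(
        comb(num_items, x) * p**x * (1 - p) ** (num_items - x)
        for x in range(cut_score, num_items + 1)
    )


# Hypothetical 20-item mastery test with a cut score of 8 right:
# compare four-option items with seven-option items.
for k in (4, 7):
    expected_chance = 20 / k
    p_pass = prob_reach_cut_by_guessing(20, 8, k)
    print(k, round(expected_chance, 2), round(p_pass, 4))
```

Raising the number of options from four to seven lowers both the expected chance score (N/k) and the probability that a pure guesser clears the cut score. A standard-setter could use a calculation like this one to place a cut score well above the chance level.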

Are There Sophisticated Approaches to Testing Which Avoid The Guessing Problem? At this juncture, it should be stated that nonconventional approaches to measuring student abilities can eliminate problems of guessing. This paper will not attempt to deal with confidence testing (see Brennan, 1974, and Stegman, 1973) or latent trait theory (see Waller, 1974), neither of which is generally utilized with most standardized tests.

Do We Urge Consistent Strategies With Screening and Diagnostic Tests? Rather than distinguish between CRTs and NRTs as to appropriate answer strategy, it does seem fruitful to distinguish between diagnostic and screening tests. When a student takes a diagnostic test, benefit comes from knowing what he/she may need to review or learn. Diagnostic tests usually have more questions on each defined skill, so an incorrect or missing response has a chance to be confirmed or negated by other questions.

In The Case of Diagnostic Tests, students should not be encouraged to guess, since considerable benefit comes from missing questions and therefore getting needed review. If diagnostic tests are used for summary data comparisons, as in evaluation, the proportion of omitted answers that might be answered by chance could be added to the raw score in a separate scoring procedure. Lord (1975) points out that such a formula correlates with traditional "guess correction" formulas, which subtract a portion of the wrong answers. Of course, adding points for omitted items does raise the score in absolute terms. It may be of interest to note that if we instruct students that, in correcting diagnostic tests, we will give points for omitted answers, guessing will decrease significantly more than if we just instruct students to "not guess" (Diamond and Evans, 1973).
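Lord's observation can be checked directly. Assume that all examinees take the same N-item test with k options per item, and write R, W, and O for the numbers right, wrong, and omitted, so that R + W + O = N. A little algebra then gives R + O/k = ((k - 1)/k)(R - W/(k - 1)) + N/k. The sketch below verifies this identity exhaustively for given test sizes. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def formula_score(right: int, wrong: int, k: int) -> Fraction:
    """Traditional correction for guessing: R - W/(k - 1)."""
    return right - Fraction(wrong, k - 1)


def omit_credit_score(right: int, omitted: int, k: int) -> Fraction:
    """Number right plus chance credit for omitted items: R + O/k."""
    return right + Fraction(omitted, k)


def check_linear_relation(n_items: int, k: int) -> bool:
    """Check R + O/k == ((k - 1)/k) * (R - W/(k - 1)) + N/k for every split
    of N items into right (R), wrong (W) and omitted (O = N - R - W)."""
    for right, wrong in product(range(n_items + 1), repeat=2):
        if right + wrong > n_items:
            continue  # not a valid split of the test
        omitted = n_items - right - wrong
        lhs = omit_credit_score(right, omitted, k)
        rhs = Fraction(k - 1, k) * formula_score(right, wrong, k) + Fraction(n_items, k)
        if lhs != rhs:
            return False
    return True


print(check_linear_relation(30, 4))  # True
```

In this idealized setting, the two scores differ only by a positive scale factor and a constant, so they order examinees identically. The omit-credit version simply sits higher on the raw-score scale, as noted above.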

On Screening Tests, which are not generally to be used as diagnostic tests, the upper limit of a person's true score (partial knowledge included) can be approached by urging completion of all questions unless the test is speeded. The tendency to guess can be considered so variable that comparisons of norm groups would be more accurate if students at least completed the entire test. This would partially offset the lower scores of poorly motivated students, who might otherwise not benefit from chance answers. It would also minimize test-wiseness differences between large cities and suburban districts in at least one major factor: risk-taking. Durost and Hodges' study of Title I testing (1974) showed that mean raw scores rose about five points on some of the 1973 Stanford Achievement "Reading" tests; however, sex differences in performance existed.
Standard deviations also rose in some cases, although the reliability of test scores tended to increase with guessing in other experiments (Diamond and Evans, 1973; Rowley and Traub, 1977). No clear empirical data of which the author is aware show that increased guessing would make it harder to find significant gains in evaluation studies.
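The risk-taking point can be made concrete. This sketch uses simplifying assumptions that are not in the text: each student either knows an item outright or has no idea at all (no partial knowledge), and a guess is a blind choice among k options. Under those assumptions, two hypothetical students with identical knowledge differ in expected number-right score by O/k. That is the chance value of the items one of them leaves blank.

```python
from fractions import Fraction

def expected_number_right(known: int, unknown: int, k: int, guesses: bool) -> Fraction:
    """Expected number-right score for a student who answers every known item
    correctly and either guesses blindly on, or omits, each unknown k-option item."""
    chance_points = Fraction(unknown, k) if guesses else Fraction(0)
    return known + chance_points

# Two hypothetical students who each know 30 of 40 four-option items.
risk_taker = expected_number_right(30, 10, 4, guesses=True)
cautious = expected_number_right(30, 10, 4, guesses=False)
print(float(risk_taker), float(cautious))  # 32.5 30.0
```

The 2.5-point gap measures willingness to guess, not reading skill. Urging every student to complete the test is meant to remove exactly that gap.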

Rowley and Traub (1977) concluded that examinees with risk-taking personalities benefit from "do not guess" directions, and that most examinees cannot distinguish pure guessing from informed guessing. A number of studies have shown that scores are higher when guessing or faster pacing is encouraged. When students are left to their own test-taking strategies, their personality characteristics strongly influence guessing (Cross and Frary, 1976; Diamond and Evans, 1973; Durost and Hodges, 1974; Ford, 1973; Rowley and Traub, 1977; and Sherman, 1976). The tendency to guess (or not to respond when any doubt exists) is probably related to socio-economic class and seems to be strongly associated with groups of minority students (Sherman, 1976). It should be noted that Maryland's attempt to teach test-wiseness arose from a suggestion by an advisory committee on minority relations in Montgomery County.

Should We Urge Guessing When Correction Formulas Are Applied? Almost all references agree that even when correction formulas for guessing are applied, a person who can eliminate one wrong answer option and then guess among the remaining options is legitimately raising his/her score. When correction formulas are used (as with College Entrance Examination Board exams, but rarely with batteries used by districts), blind guessing (blackening in the spaces) wastes time that might be better used in reasoning through a difficult question. For groups tested with correction formulas for guessing, averages rarely go down even with blind guessing. Scores obviously will not go lower if a number-right scoring system is used. Furthermore, Rowley and Traub (1977) present data from ninth graders demonstrating that 41% of the answers students claimed were uninformed guesses were in fact correct, well above the rate expected by chance. They conclude that students cannot distinguish between informed and blatantly random guessing. This suggests that a student's misjudgment does no harm when it results in guessing on a test scored with a correction formula. Severe harm is done, however, when a student neglects to guess on a test scored by the number-right formula. One should not urge blatant guessing when correction formulas for guessing are applied; however, not much is lost if the student "over-generalizes" from the normal type of school exam.
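The arithmetic behind the claim about eliminating one option can be checked one item at a time. This sketch assumes the traditional correction of 1/(k - 1) point per wrong answer and a random choice among the options not eliminated. With four options, blind guessing has an expected corrected score of zero, the same as omitting. Ruling out even one wrong option makes a guess worth a positive expected amount. Under number-right scoring, every guess has positive expected value.

```python
from fractions import Fraction

def expected_item_score(k: int, eliminated: int, corrected: bool) -> Fraction:
    """Expected score on one k-option item when the examinee rules out
    `eliminated` wrong options and guesses at random among the rest."""
    if k < 2 or not 0 <= eliminated < k:
        raise ValueError("need k >= 2 and 0 <= eliminated <= k - 1")
    p_right = Fraction(1, k - eliminated)
    if not corrected:
        return p_right                          # number-right: no penalty for a miss
    penalty = Fraction(1, k - 1)                # traditional W/(k - 1) correction
    return p_right - (1 - p_right) * penalty

# Four-option items; omitting an item scores 0 under either rule.
for e in range(3):
    print(e, expected_item_score(4, e, corrected=True), expected_item_score(4, e, corrected=False))
# 0 0 1/4
# 1 1/9 1/3
# 2 1/3 1/2
```

The zero in the first row is why blind guessing does not lower group averages under a correction formula. It only costs the time it takes.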

Where Can I Find Additional Discussion on the Guessing Question? Opponents of encouraging guessing include Durost and Hodges (1974), Lord (1975), and Sherman (1976). The reader may wish to consult these references before establishing a test-wiseness program. Those advocating guessing, especially under number-right scoring and particularly when one wrong answer can be eliminated, include Ford (1973) and Rowley and Traub (1977). All advocate test-wiseness instruction.


What Might One Conclude From The Ideas Presented In This Paper? In summary, on assessment and evaluation tests, completion of the test does not seem to harm evaluative group comparisons and may eliminate some variance due to test-wiseness. If scoring formulas that add "chance points" to raw scores cannot be found, mechanical blackening in of answer sheets can accomplish the same adjustment on average. Such an adjustment at least puts all students in comparison groups on the same footing with regard to guessing. Differences in scores will then depend more heavily on the variable intended to be measured. The standard deviation of group scores might be reduced, whereas the effect on individual scores is controversial. Empirical studies do not seem to confirm theoretical models which disregard "educated guessing."
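A quick simulation shows how the "mechanical blackening in" adjustment behaves. It assumes, hypothetically, that each student has 12 omitted four-option items filled in at random. The average gain matches the O/k chance-point adjustment. Each individual student's gain, however, scatters around that average, which a fixed scoring formula would avoid.

```python
import random
import statistics

def blacken_in(omits: int, k: int, rng: random.Random) -> int:
    """Mark each omitted k-option item at random; return how many land on the key."""
    # Treat option 0 as the keyed answer; a random mark hits it with probability 1/k.
    return sum(1 for _ in range(omits) if rng.randrange(k) == 0)

def simulate(omits: int, k: int, students: int, seed: int = 1979) -> list[int]:
    """Chance gains for many hypothetical students, each with `omits` blank items."""
    rng = random.Random(seed)
    return [blacken_in(omits, k, rng) for _ in range(students)]

gains = simulate(omits=12, k=4, students=10_000)
print("mean gain:", statistics.mean(gains), "vs O/k =", 12 / 4)
print("spread of gains (SD):", statistics.pstdev(gains))
```

The mean should land close to 3 and the standard deviation close to 1.5, the binomial value of the square root of 12 x 1/4 x 3/4. Two students with the same knowledge can therefore end up a few points apart purely by chance.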

Durost and Hodges' (1974) research deserves scrutiny beyond this paper, since it contains data quite generalizable to needs assessment operations in New Jersey and also bears on an experimental variable encouraging students to complete the screening instrument. Insightful comments on their research reveal that test-wise behavior was lacking in many New Hampshire youths, with cautions stated about the meaningfulness of much of the test data. The paper's data may need reworking before the reader is willing to accept Durost and Hodges' conclusion that attempts to eliminate guessing would yield more information on such non-diagnostic tests. Their thoughts on criterion-referenced item analysis and on placing items on a test in order of difficulty should be carefully evaluated. They conclude, as do many test manufacturers, that mathematical correction for guessing has no benefit in reducing guessing or reordering students in performance rank. It does produce lower scores in an absolute sense, but it may also discriminate against certain personality types.

(1) Students Must Be Taught To Pace. There appears to be some evidence that "speed reading" a test is a test-wise approach to demonstrating the abilities to be measured. Miller and Weiss (1976) found that imposing time limits on difficult test items did not reduce accuracy but did increase test-taking speed. The Maryland test-wiseness syllabus (1975) trains students to adjust their pace according to the type of subtest (also see Hallberg, 1971). For non-fiction reading items, it recommends reading the question first and then scanning for the answer. So the approach can even vary by item type (Maryland State Department of Education, 1975), much as regular reading or text-studying behaviors might be adjusted for the type of book being read.

Students should be urged not to become disturbed if they cannot answer every item with complete confidence (Sabers, 1975). A student willing to abandon certain questions can speed on to more questions. Spatial relations tests are an excellent example of tests on which a person can become bogged down on certain questions. A change of 70 percentile ranks is not unknown when such a test is taken a second time at a more cursory pace.
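The benefit of abandoning troublesome questions can be sketched with hypothetical item times in minutes. The model is idealized. A student who works strictly in order stays on each item until it is done. A student using a two-pass strategy skips any item expected to take longer than a time cap and returns to skipped items only if time remains. The model counts items finished, not items answered correctly.

```python
def finished_in_order(times: list[float], limit: float) -> int:
    """Work straight through; stay on each item until it is done."""
    used, done = 0.0, 0
    for t in times:
        if used + t > limit:
            break
        used += t
        done += 1
    return done

def finished_two_pass(times: list[float], limit: float, cap: float) -> int:
    """Pass 1: do items expected to take at most `cap` minutes, skip the rest.
    Pass 2: return to the skipped items, in order, while time remains."""
    used, done = 0.0, 0
    skipped = []
    for t in times:
        if t > cap:
            skipped.append(t)
        elif used + t <= limit:
            used += t
            done += 1
    for t in skipped:
        if used + t <= limit:
            used += t
            done += 1
    return done

# Hypothetical 10-item section, 10-minute limit, with two "bogging down" items.
times = [1, 1, 6, 1, 1, 5, 1, 1, 1, 1]
print(finished_in_order(times, limit=10))          # 5
print(finished_two_pass(times, limit=10, cap=2))   # 8
```

The in-order student loses the last four quick items to the two slow ones. The two-pass student reaches every quick item first.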

Two necessary ingredients for pacing instruction are practice and access to an external time check. Classroom practice and yearly examinations can provide test-wiseness to varying degrees in K-12 curricula. Practice on the particular forms of items that will be encountered on a test influences pacing more than counting on generalization does. Possibly, however, even SAT scores may eventually begin to rise as students are given more testing experience under general accountability pressures.

The second factor in pacing is being able to see some timing device and to calculate time intervals. Some consider writing times on the chalkboard or announcing the time left too disturbing (e.g., 26). A silent clock in the room, viewable by all, with the finish time written on the board (by diagram for youngsters who can't tell time) is a good solution. It should be noted that many children cannot afford wrist watches and their schools may not have operable clocks. Are normative or evaluative comparisons not made invalid by such problems associated with socio-economic conditions?
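Posting checkpoints next to the finish time gives students an external time check they can read at a glance. This minimal sketch assumes time is spread evenly across items; it does not model the Maryland syllabus's point that pace should vary by subtest. The start time, test length, and item count are hypothetical.

```python
from datetime import datetime, timedelta

def pacing_checkpoints(start: str, minutes: int, items: int, marks: int = 4) -> list[tuple[str, int]]:
    """Return (clock time, item number) pairs a proctor could post on the board,
    assuming time is spread evenly across items. The last pair is the finish time."""
    if marks < 1:
        raise ValueError("marks must be at least 1")
    t0 = datetime.strptime(start, "%H:%M")
    rows = []
    for i in range(1, marks + 1):
        elapsed = timedelta(minutes=minutes * i / marks)
        clock = (t0 + elapsed).strftime("%H:%M")
        rows.append((clock, round(items * i / marks)))
    return rows

for clock, item in pacing_checkpoints("09:15", minutes=40, items=60):
    print(f"By {clock} you should be near item {item}")
# By 09:25 you should be near item 15
# By 09:35 you should be near item 30
# By 09:45 you should be near item 45
# By 09:55 you should be near item 60
```

Whole-minute checkpoints also suit the concern about digital clocks: students compare the board with the clock's minute reading rather than timing a fleeting display.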

The recent trend to digital clocks may require test coordinators to supplement the timepieces in testing rooms. Depending on what fraction of a minute is viewed on a digital or impulse clock, an examinee's "minute" check may be over in a few seconds. Furthermore, many digital clock faces are less visible to students than classical sweep-second-hand clocks.

(2) Some Students Are Aware of Item Construction Cues. Certain groups of teachers and students may remain naive about item-writing cues because no one ever bothered teaching them how to take a test or how to write items. Diamond et al. (1976) and Millman et al. (1965) deal with test-wiseness and item construction. One may wonder how test constructors can commit item-writing errors so blatant that such test-wiseness instruction can be beneficial. Districts should carefully assess the worth of teaching students to spot incorrect or correct answers by such factors as option length, matched graphemes in stem and answer, and ungrammatical alternatives. It is true that some test publishers and many teachers still commit such errors.
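Districts reviewing locally written items could screen for the cues named above before deciding how much to teach about them. The rough sketch below makes three simplifying choices. It approximates "matched graphemes" by whole words shared between stem and option. It checks ungrammatical alternatives only for an "an" that ends the stem. Its length threshold is an illustrative assumption, not a validated rule. The sample item is invented.

```python
import re

# Common function words ignored when comparing stem and option wording.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "is", "and", "or", "which", "what"}

def content_words(text: str) -> set[str]:
    """Lowercase alphabetic words minus common function words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def cue_flags(stem: str, options: dict[str, str], length_ratio: float = 1.5) -> list[str]:
    """Flag options a test-wise student might spot without knowing the content.
    The 1.5x length threshold is a hypothetical default, not a published rule."""
    if len(options) < 2:
        raise ValueError("need at least two options")
    flags = []
    stem_words = content_words(stem)
    stem_ends_with_an = re.search(r"\ban\s*$", stem, re.IGNORECASE) is not None
    for key, text in options.items():
        other_lengths = [len(t) for other, t in options.items() if other != key]
        if len(text) >= length_ratio * sum(other_lengths) / len(other_lengths):
            flags.append(f"{key}: much longer than the other options")
        shared = content_words(text) & stem_words
        if shared:
            flags.append(f"{key}: repeats stem word(s) {sorted(shared)}")
        if stem_ends_with_an and text.strip()[:1].lower() not in "aeiou":
            flags.append(f"{key}: does not follow 'an' grammatically")
    return flags

stem = "A plant makes its own food through a process called"
options = {
    "A": "respiration",
    "B": "photosynthesis, in which the plant uses light energy to make its own food",
    "C": "digestion",
    "D": "evaporation",
}
for flag in cue_flags(stem, options):
    print(flag)
# B: much longer than the other options
# B: repeats stem word(s) ['food', 'its', 'own', 'plant']
```

A flagged item is not necessarily faulty. The list simply marks items worth a second look by the item writer.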

(3) Provide Psychomotor Practice for Some Test Answer Sheets and Students. With some answer sheets, young students can benefit from practice in recording answers (Sabers, 1975). Some test-wiseness programs provide "answer entry" practice for several days prior to testing (Maryland State Department of Education, 1975). Test coordinators should choose the answer-recording format and scoring services with care. Extra money spent on assessment can save unnecessary remedial expenditures.

Some have discovered that rewriting horizontal math problems becomes a test of small-muscle coordination rather than math competency. If conclusions drawn from assessments might be connected with psychomotor preparation, notations made during testing or during test-wiseness preparation could be used in item analysis to make instruction more appropriate. Ideally, test-wiseness instruction would have a psychomotor component to increase the validity of screening tests that follow such instruction.

(4) Practice With Format of Items and Allow Students to Cultivate Familiarity with Directions. The student who understands the directions well and does not have to refer to them during a timed test has a definite advantage over the student who is unfamiliar with the item format and directions. Some types of questions are so involved that weeks of practice are needed. Practice embedded in the normal instructional routine is more efficient than separate units of format practice (Eakins et al., 1976; Ford, 1973; and Maryland State Department of Education, 1975).

(5) Urge Students to Check Answers. Although unsure guesses may generally be correct on the first try, reasoned answers have proved more often correct when reconsidered or checked once. Continually changing answers is dangerous. Ford (1973) warns against pondering over a question at length.

(6) Demonstrate That Eliminating at Least One Answer Helps Raise Scores. Ford (1973) recommends reading all options before deciding on one. If time is limited, locating at least one wrong option and then choosing what looks like a good answer may prove more efficient. Once again, blank answers or "don't knows" are not useful in the author's opinion.



Reaction To This Paper 

Not all existing programs, nor all pertinent journal articles, have been reviewed for this paper. It is hoped that readers will reply with added suggestions for further development of the issues discussed here. A test-wiseness program developed with the suggestions contained in this paper will do no harm to student scores, nor will it confound evaluation practices. We will also make an effort to continue disseminating possible improvements to test-wiseness programs.

We hope some response is forthcoming from test publishers to the concerns expressed in this paper, especially in revising test correction formulas so that districts might select appropriate techniques.

Our aim in suggesting test-wiseness courses is to improve the measurement process, not destroy it.